Data Release: Developmental NeuroToxicity Data Integration and Visualization
Enabling Resource (DNT-DIVER)
--------------------------------------------------------------------------------

Phase 1 data release: 2025-06-12

To cite this dataset: TBD

## Data Analysis Pipeline

The DNT-DIVER Phase 1 release includes 9 assays/datasets: hNP1 proliferation
assay, hNP1 apoptosis assay, human NCC migration assay, rat neurite outgrowth
assay, human neurite outgrowth assay, rat cortical synaptogenesis and
maturation assay, rat network formation assay, human neurosphere assay and
zebrafish larval behavior assay. A total of 115 chemicals were tested in Phase
1; however, this release contains data only for the 95 non-botanical chemicals.
All chemicals were tested at four or more concentrations (including plate
vehicle control), except in the human NCC migration assay, where tiered
screening was conducted and 33 chemicals were tested at a single
concentration.

Endpoints within each dataset were processed to generate the raw response
values that often showed a directional (increased/decreased) effect with a
tendency towards a monotonic, concentration-dependent relationship. When
applicable, the responses for each endpoint on each plate were normalized using
the vehicle control responses on the respective plate (median response on a
plate) and were shifted so that the baseline response was 0. Therefore,
response > 0 (< 0) means that the response increased (decreased) after chemical
exposure compared with responses in the vehicle control wells. In the current
release, all endpoints were normalized as described above except the relative
area-under-the-curve (AUC) endpoints in the zebrafish larval behavior assay.
Both raw and normalized response values are available in the data release.
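
The normalization step can be illustrated with a minimal sketch. The exact
formula is not stated in this README, so the percent-change form below
(response divided by the plate's median vehicle control, minus 1, times 100) is
an assumption, as are the function and variable names and the example values.

```python
from statistics import median

def normalize_plate(responses, is_vehicle_control):
    """Percent change relative to the plate's median vehicle-control response.

    Assumption: percent-of-control scaling shifted so the baseline is 0.
    The release may use a different scaling for some endpoints.
    """
    vc = [r for r, is_vc in zip(responses, is_vehicle_control) if is_vc]
    if not vc:
        raise ValueError("plate has no vehicle control wells")
    baseline = median(vc)
    if baseline == 0:
        raise ValueError("median vehicle control response is 0")
    return [(r / baseline - 1.0) * 100.0 for r in responses]

# Illustrative raw responses for one plate (not taken from the release).
raw = [100.0, 110.0, 90.0, 50.0, 150.0]
is_vc = [True, True, True, False, False]
normalized = normalize_plate(raw, is_vc)
```

With this convention, a normalized value below 0 indicates a decrease relative
to the vehicle control and a value above 0 indicates an increase.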

Concentration-response data for all endpoints were processed using the
benchmark concentration (BMC) approach implemented in the R package Rcurvep
(https://cran.r-project.org/package=Rcurvep). The BMC approach generates a BMC
value, which is the potency at which the response is equivalent to an
endpoint-specific benchmark response (BMR). Two approaches were used to derive
BMRs: a) the lowest response threshold at which the variance in potency
estimation is sufficiently reduced, and b) the median value plus 3 median
absolute deviations (MAD) of the responses from the vehicle control across the
plates. Approach a), described by Hsieh et al. (2019), is implemented in the
Rcurvep package. It simulates 100 concentration-response curves for each
chemical using a bootstrap approach and examines the pooled variance of potency
at candidate thresholds (from 5% to 95% in 5% increments). In the current
release, only the relative area-under-the-curve (AUC) endpoints in the
zebrafish larval behavior assay used approach b). For the BMR values used for
specific endpoints, please refer to the included BMR file.
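
As an illustration of approach b), the sketch below computes a BMR as the
median plus 3 MAD of vehicle control responses. Whether the MAD was scaled (for
example by the 1.4826 normal-consistency constant) is not stated here, so the
`mad_scale` parameter is an assumption and defaults to no scaling. The function
name and example values are hypothetical.

```python
from statistics import median

def bmr_median_plus_mad(vehicle_responses, n_mad=3.0, mad_scale=1.0):
    """BMR as median + n_mad * MAD of vehicle control responses.

    mad_scale is an assumption: 1.0 gives the raw MAD; 1.4826 would make
    the MAD comparable to a standard deviation under normality.
    """
    values = list(vehicle_responses)
    center = median(values)
    mad = mad_scale * median([abs(v - center) for v in values])
    return center + n_mad * mad

# Illustrative vehicle control responses pooled across plates.
bmr = bmr_median_plus_mad([-2.0, -1.0, 0.0, 1.0, 2.0])
```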

Two methods were used to process the concentration-response data to generate
BMC values: one parametric approach (Hill model equation) and one
non-parametric approach (Curvep, model-free). Both methods are implemented in
the Rcurvep package. In the current release, only the output from Curvep was
used in the data analysis for the zebrafish assay. Chemicals with >= 4
concentrations were processed. The BMC data from these chemical-endpoint pairs
should be treated with caution. The BMC results for both the Hill and Curvep
approaches are available in the data release. Details on method differences in
processing the data are explained below.

1) Hill: The Hill model is commonly used in dose-response modeling. It was
fitted to the data using a version of the algorithm in the tcpl R package
(Filer et al. 2017). An adjustment was applied to the original function to
avoid unrealistically low AC50 values in the Hill model: the AC50 was required
to be no lower than 0.01 times the lowest nonzero concentration. The 95%
confidence limits on the BMC were calculated using a bootstrap procedure and
the quantile approach, with bootstrap data generated by adding error (sampled
from the residuals of the Hill model fit) to the fitted values of the model.
The resampling was done 1000 times to generate a bootstrap distribution for the
BMC value. An activity confidence score was calculated as the fraction of
simulated concentration-response data (i.e., bootstrap data generated by adding
residuals) having a BMC in the tested concentration range. An active call was
defined as having an activity confidence score > 0.5. An additional requirement
that the top asymptote of the Hill model be at least 0.8 * BMR was also
applied.
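
The pieces of the Hill procedure can be sketched as follows. This is not the
tcpl or Rcurvep code: the parameterization `vmax * c^n / (k^n + c^n)` is
assumed from the `param_vmax`, `param_k`, and `param_n` column names, the
helper names are hypothetical, the magnitude comparison in the active call is
an assumption for decreasing endpoints, and the model refit on each bootstrap
sample is omitted.

```python
import random

def hill_response(conc, vmax, k, n):
    """Hill curve with baseline 0: vmax * c^n / (k^n + c^n)."""
    return vmax * conc**n / (k**n + conc**n)

def hill_bmc(bmr, vmax, k, n):
    """Concentration where the Hill curve equals bmr, or None if it never does."""
    same_direction = vmax * bmr > 0
    if not same_direction or abs(vmax) <= abs(bmr):
        return None
    return k * (bmr / (vmax - bmr)) ** (1.0 / n)

def min_allowed_ac50(concentrations):
    """AC50 floor from the text: 0.01 times the lowest nonzero concentration."""
    return 0.01 * min(c for c in concentrations if c > 0)

def residual_bootstrap(fitted, residuals, rng):
    """One simulated curve: fitted values plus residuals resampled with replacement."""
    return [f + rng.choice(residuals) for f in fitted]

def is_active_hill(confidence_score, vmax, bmr):
    """Active call: score > 0.5 and top asymptote at least 0.8 * BMR (by magnitude)."""
    return confidence_score > 0.5 and abs(vmax) >= 0.8 * abs(bmr)

# Illustrative values only (not taken from the release).
concs = [0.0, 0.1, 1.0, 10.0, 100.0]
fitted = [hill_response(c, vmax=100.0, k=10.0, n=1.0) for c in concs]
observed = [1.0, 0.5, 10.0, 48.0, 92.0]
residuals = [o - f for o, f in zip(observed, fitted)]
simulated = residual_bootstrap(fitted, residuals, random.Random(0))
bmc = hill_bmc(bmr=50.0, vmax=100.0, k=10.0, n=1.0)
```

In a full implementation, each simulated curve would be refitted and its BMC
recorded; the activity confidence score and confidence limits are then computed
from the resulting distribution of BMC values.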

2) Curvep: Curvep is a response noise filtering algorithm designed for
quantitative high-throughput screening assays (Sedykh 2016; Sedykh et al.
2011). To generate a 95% confidence interval around the BMC and an activity
confidence score for each chemical-endpoint pair, a bootstrap approach was
used: simulated concentration-response data (n=1000, 1:1 concentration-response
data) were created by bootstrapping responses per concentration. The simulated
data were processed using Curvep, curves with the "INVERSE" flag were set as
inactive, and the quantile approach was then used to derive the confidence
interval. An active call was defined as having an activity confidence score >
0.5.
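
The bootstrap and summary steps can be sketched as below. Curvep itself is not
reimplemented: `first_crossing` is a deliberately simple stand-in (linear
interpolation to the first BMR crossing, increasing direction only), and all
names and data are hypothetical. Inactive curves, including those that Curvep
would flag as "INVERSE", are represented as `None`. Computing the interval only
from curves that yielded a BMC is also an assumption.

```python
import random
from statistics import quantiles

def bootstrap_curves(responses_by_conc, n_boot=1000, seed=0):
    """Draw one response per concentration, n_boot times (1:1 curves)."""
    rng = random.Random(seed)
    concs = sorted(responses_by_conc)
    return [[(c, rng.choice(responses_by_conc[c])) for c in concs]
            for _ in range(n_boot)]

def first_crossing(curve, bmr):
    """Stand-in potency estimate (NOT Curvep): first interpolated BMR crossing."""
    if curve[0][1] >= bmr:
        return curve[0][0]
    for (c0, r0), (c1, r1) in zip(curve, curve[1:]):
        if r0 < bmr <= r1:
            return c0 + (c1 - c0) * (bmr - r0) / (r1 - r0)
    return None  # never reaches the BMR: treated as inactive

def summarize(bmcs):
    """Activity confidence score and 95% quantile interval of the BMC."""
    hits = [b for b in bmcs if b is not None]
    score = len(hits) / len(bmcs)
    if len(hits) < 2:
        return score, None, None
    cuts = quantiles(hits, n=40, method="inclusive")  # 2.5% steps
    return score, cuts[0], cuts[-1]

# Illustrative replicate responses keyed by concentration (uM).
data = {1.0: [0.0, 2.0, -1.0], 10.0: [5.0, 8.0, 6.0], 100.0: [40.0, 45.0, 50.0]}
curves = bootstrap_curves(data, n_boot=1000, seed=1)
score, bmcl, bmcu = summarize([first_crossing(c, bmr=20.0) for c in curves])
is_active = score > 0.5
```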

In addition to activity, chemical selectivity was also considered. Selectivity
is defined as a neuro- or developmental-specific effect occurring in the
absence of general toxicity (i.e., mortality or loss of cell viability). The
selectivity ratio is the ratio of the BMC for general toxicity to the BMC for
the neuro- or developmental-specific effect. If a chemical was inactive for
general toxicity, the highest tested concentration was used as the general
toxicity BMC.
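
A minimal sketch of the selectivity ratio calculation follows. The function and
argument names are hypothetical; the second return value mirrors the idea of
the `has viability bmd` flag in the BMC files.

```python
def selectivity_ratio(specific_bmc, viability_bmc, highest_tested_conc):
    """Ratio of general toxicity BMC to specific-effect BMC (not log10 scaled).

    Returns (ratio, has_viability_bmc). If no viability BMC exists, the
    highest tested concentration is substituted, as described in the text.
    """
    if specific_bmc is None:
        return None, viability_bmc is not None
    has_viability_bmc = viability_bmc is not None
    toxicity_bmc = viability_bmc if has_viability_bmc else highest_tested_conc
    return toxicity_bmc / specific_bmc, has_viability_bmc

# Illustrative values in uM.
with_tox = selectivity_ratio(2.0, 20.0, 100.0)
without_tox = selectivity_ratio(2.0, None, 100.0)
```

A ratio above 1 means the specific effect occurred at a lower concentration
than general toxicity.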

## Benchmark Response (BMR)

Benchmark_Response_Table.csv columns:

- provider: where assay was run
- dataset: assay (9 unique values)
- endpoint: name of endpoint (76 unique values)
- direction: curve fitting direction for corresponding BMR value (-1 down, 1
  up)
- sd_well: standard deviation of normalized responses in all vehicle control
  wells across plates
- mean_well: mean of normalized responses in all vehicle control wells across
  plates
- n_out_well: number of data points (normalized responses) identified as
  outliers by the Tukey outlier test with degree = 3
- n_well: total number of all vehicle control wells across plates
- sd_in_well: standard deviation of normalized responses in all vehicle control
  wells across plates after removing the outliers
- mean_in_well: mean of normalized responses in all vehicle control wells
  across plates after removing the outliers
- percent_out: percent of normalized responses in all vehicle control wells
  across plates identified as outliers by the Tukey outlier test with degree =
  3
- outlier_test_used: whether the Tukey outlier test was used to determine the
  standard deviation of normalized responses in all vehicle control wells
  across plates (note: if >5% of responses were outliers, the outlier test was
  not used)
- sd_well_used: standard deviation of normalized responses in all vehicle
  control wells across plates that was reported
- mean_well_used: mean of normalized responses in all vehicle control wells
  across plates that was reported
- BMR_method: method to select BMR
- BMR: benchmark response

## Datasets

The data export currently contains three datasets, each of which is stored in
Tab Separated Value (TSV) format:

1. Well Response - individual well-response data. Please note that each well
   may have multiple measurements, and therefore there is not a 1:1 mapping of
   unique wells to responses. Data were normalized as described in Behl et al.,
   2019.
2. BMC CurveP - benchmark concentration (BMC) results of modeling using the
   non-parametric CurveP method. The method is described in detail in Behl et
   al., 2019; how benchmark response (BMR) values were selected is detailed in
   Hsieh et al., 2019.
3. BMC Hill - benchmark concentration (BMC) results of modeling using the
   parametric Hill method from the tcpl package. The method is described in
   detail in Behl et al., 2019; how benchmark response (BMR) values were
   selected is detailed in Hsieh et al., 2019.

well_response.tsv columns:

- provider: where assay was run
- dataset: assay (9 unique values)
- endpoint: name of endpoint (76 unique values)
- endpoint category: category of endpoint for summarization
- is viability: is endpoint measuring overall viability or specific outcome
- chemical name: chemical name
- chemical casrn: chemical CASRN
- chemical category: NTP chemical category
- chemical lot: chemical LOT
- plate id: unique plate identifier for provider + dataset
- plate row: row of measurement on plate (or null)
- plate column: column of measurement on plate (or null)
- well type: S - sample, VC - vehicle control, PC - assay positive control;
  UKonstanz NCC migration specific: BG - background, BGBrdU - BrdU background,
  LC - lysis control
- concentration: concentration of chemical for exposure
- concentration unit: typically uM for samples and % for DMSO VC
- response raw: unprocessed raw response value (or null)
- response normalized: normalized response value in NTP analysis (or null)

bmc_curvep.tsv columns:

- provider: where assay was run
- dataset: assay (9 unique values)
- endpoint: name of endpoint (76 unique values)
- endpoint category: category of endpoint for summarization
- is viability: is endpoint measuring overall viability or specific outcome
- chemical name: chemical name
- chemical casrn: chemical CASRN
- chemical category: NTP chemical category
- bmr: benchmark response value where bmd is calculated
- bmd: benchmark dose (uM)
- bmdl: benchmark dose lower-confidence limit (uM)
- bmdu: benchmark dose upper-confidence limit (uM)
- wauc: weighted area under the curve
- emax: maximum concentration
- doses: comma separated list of input doses (uM)
- responses: comma separated list of input response
- is increasing: boolean flag for positive/negative response test
- is active: boolean flag for if a bmd was calculated that passed requirements
- selectivity ratio: selectivity ratio for non-viability/viability for
  endpoint, not log10 scaled
- has viability bmd: boolean flag if selectivity ratio was calculated with
  viability bmc or maximum tested dose

bmc_hill.tsv columns:

- provider: where assay was run
- dataset: assay (9 unique values)
- endpoint: name of endpoint (76 unique values)
- endpoint category: category of endpoint for summarization
- is viability: is endpoint measuring overall viability or specific outcome
- chemical name: chemical name
- chemical casrn: chemical CASRN
- chemical category: NTP chemical category
- bmr: benchmark response value where bmd is calculated
- bmd: benchmark dose (uM)
- bmdl: benchmark dose lower-confidence limit (uM)
- bmdu: benchmark dose upper-confidence limit (uM)
- param_vmax: vmax parameter in hill fit (see tcpl documentation for more
  details)
- param_k: k parameter in hill fit (see tcpl documentation for more details)
- param_n: n parameter in hill fit (see tcpl documentation for more details)
- is increasing: boolean flag for positive/negative response test
- is active: boolean flag for if a bmd was calculated that passed requirements
- selectivity ratio: selectivity ratio for non-viability/viability for
  endpoint, not log10 scaled
- has viability bmd: boolean flag if selectivity ratio was calculated with
  viability bmc or maximum tested dose
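
As a reading aid, the sketch below parses rows from a file shaped like
bmc_curvep.tsv using only the Python standard library. It assumes the header
names match the column names listed above exactly (including spaces) and that
missing values appear as empty strings; both are assumptions to verify against
the actual files. The sample row is made up for illustration.

```python
import csv
import io

def parse_float_list(text):
    """Split a comma separated field into floats, skipping empty items."""
    return [float(item) for item in text.split(",") if item.strip()]

def read_bmc_curvep(handle):
    """Yield rows as dicts with numeric bmd and list-valued doses/responses."""
    for row in csv.DictReader(handle, delimiter="\t"):
        row["bmd"] = float(row["bmd"]) if row["bmd"].strip() else None
        row["doses"] = parse_float_list(row["doses"])
        row["responses"] = parse_float_list(row["responses"])
        yield row

# Made-up sample with a subset of columns; real files have more columns.
sample = (
    "chemical name\tendpoint\tbmd\tdoses\tresponses\n"
    "Chemical A\tendpoint 1\t1.5\t0.1,1,10\t-1.0,5.5,40\n"
    "Chemical B\tendpoint 1\t\t0.1,1,10\t0.2,-0.4,1.1\n"
)
rows = list(read_bmc_curvep(io.StringIO(sample)))
```

To read the real file, pass an open file handle instead of the in-memory
sample, for example `open("bmc_curvep.tsv", newline="")`.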

## References:

Behl M, Ryan K, Hsieh JH, Parham F, Shapiro AJ, Collins BJ, et al. 2019.
Screening for Developmental Neurotoxicity at the National Toxicology Program:
The Future Is Here. Toxicol Sci 167(1):6-14; doi: 10.1093/toxsci/kfy278.

Filer DL, Kothiya P, Setzer RW, Judson RW, Martin MT. 2017. tcpl: the ToxCast
pipeline for high-throughput screening data. Bioinformatics; doi:
10.1093/bioinformatics/btw680.

Hsieh J-H, Ryan K, Sedykh A, Lin J-A, Shapiro AJ, Parham F, et al. 2019.
Application of benchmark concentration (BMC) analysis on zebrafish data – a new
perspective for quantifying toxicity in alternative animal models. Toxicol Sci;
doi: 10.1093/toxsci/kfy258.

Sedykh A. 2016. CurveP Method for Rendering High-Throughput Screening
Dose-Response Data into Digital Fingerprints. In: High-Throughput Screening
Assays in Toxicology (H. Zhu and M. Xia, eds). Methods in Molecular Biology.
Springer New York:New York, NY. 135-141.

Sedykh A, Zhu H, Tang H, Zhang L, Richard A, Rusyn I, et al. 2011. Use of in
vitro HTS-derived concentration-response data as biological descriptors
improves the accuracy of QSAR models of in vivo toxicity. Environ Health
Perspect 119:364-370; doi: 10.1289/ehp.1002476.


